Hyperbolic Behavior Of Occupation Measures Between Neighboring Policies In CMDPs
Abstract
We study the change in the steady-state probabilities as the controller of a Markov Decision Process (MDP) shifts from one deterministic policy to another by gradually changing the selected action in a single state. We prove that the steady-state probability for each state-action pair is a hyperbolic Möbius transformation. In particular, this implies that the change is monotone. The same also holds for the cost function in the discounted cost and expected average cost models. We also extend this result to constrained MDPs with an arbitrary number of constraints.
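As a rough illustration of the claim in the abstract, the following Python sketch builds a small chain in which state 0 randomizes between two actions, with action b chosen with probability theta, and checks numerically that the steady-state probability of the pair (state 0, action b) behaves like a Möbius function of theta. Everything in it is an assumption made for illustration and not taken from the paper: the 3-state chain, its transition probabilities, the parameter name `theta`, and the helper names. The check relies on two standard facts: the stationary probabilities of an irreducible chain are proportional to the diagonal cofactors of I - P, and Möbius transformations preserve cross-ratios.

```python
from fractions import Fraction as F

# Hypothetical 3-state example; all numbers are illustrative, not from the paper.
P_A = [F(1, 2), F(1, 4), F(1, 4)]     # transitions from state 0 under action a
P_B = [F(1, 10), F(1, 5), F(7, 10)]   # transitions from state 0 under action b
FIXED_ROWS = [
    [F(1, 3), F(1, 3), F(1, 3)],      # state 1 (policy unchanged)
    [F(1, 5), F(3, 5), F(1, 5)],      # state 2 (policy unchanged)
]


def transition_matrix(theta):
    """Mix actions a and b in state 0; b is selected with probability theta."""
    row0 = [(1 - theta) * pa + theta * pb for pa, pb in zip(P_A, P_B)]
    return [row0] + FIXED_ROWS


def stationary(P):
    """Exact stationary distribution of a 3-state irreducible chain.

    pi_j is proportional to the (j, j) cofactor of I - P.
    """
    n = 3
    A = [[(1 if i == j else 0) - P[i][j] for j in range(n)] for i in range(n)]
    minors = []
    for j in range(n):
        r0, r1 = [i for i in range(n) if i != j]
        minors.append(A[r0][r0] * A[r1][r1] - A[r0][r1] * A[r1][r0])
    total = sum(minors)
    return [m / total for m in minors]


def occupation_0b(theta):
    """Steady-state probability of the pair (state 0, action b)."""
    return theta * stationary(transition_matrix(theta))[0]


def cross_ratio(z1, z2, z3, z4):
    """Cross-ratio of four distinct points; invariant under Möbius maps."""
    return (z1 - z3) * (z2 - z4) / ((z2 - z3) * (z1 - z4))


if __name__ == "__main__":
    for k in range(0, 11, 2):
        theta = F(k, 10)
        print(f"theta={float(theta):.1f}  x(0,b)={float(occupation_0b(theta)):.4f}")
```

Exact rational arithmetic is used so that the cross-ratio comparison can be an equality test and not a floating-point tolerance check. Agreement on a handful of sample points is only a sanity check on this one example, not a proof of the general result.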
Similar resources
Constrained Markov decision processes with total cost criteria: Occupation measures and primal LP
This paper is the third in a series on constrained Markov decision processes (CMDPs) with a countable state space and unbounded cost. In the previous papers we studied the expected average and the discounted cost. We analyze in this paper the total cost criterion. We study the properties of the set of occupation measures achieved by different classes of policies; we then focus on stationary poli...
A Strongly Polynomial Algorithm for Controlled Queues
We consider the problem of computing optimal policies of finite-state finite-action Markov decision processes (MDPs). A reduction to a continuum of constrained MDPs (CMDPs) is presented such that the optimal policies for these CMDPs constitute a path in a graph defined over the deterministic policies. This path contains, in particular, an optimal policy of the original MDP. We present an algori...
Alignment of Iran's Legislative Approach with International Regulations in the Field of Literary and Artistic Property Rights
In literary and artistic property law, under the principle of the independence of sovereign governments, each nation regulates the protection of creators of literary and artistic works and of holders of neighboring rights according to its local conditions, requirements, and policies. However, differences in national laws due to differences in approaches, policies and his deve...
Prediction of foundations behavior by a stress level based hyperbolic soil model and the ZEL method
In shallow foundations, the third bearing capacity factor, Nγ, has been found to show a decreasing tendency with increasing foundation size. This is supported by experimental observations and is related mainly to the stress-level-dependent nature of the soil. On the other hand, the bearing capacity is often obtained theoretically without consideration of the foundation vertical displacements. In thi...
Motion Planning with Safety Constraints and High-Level Task Specifications
The formalism of linear temporal logic (LTL) [2] is increasingly being used to express task specifications in robotics, automation, and manufacturing. Its expressiveness, coupled with its ease of use, makes it suitable for numerous scenarios. LTL alone, however, just expresses temporal relationships and misses the ability to model the unavoidable uncertainty emerging in interactions with the ph...